INTRODUCTION TO NON-LINEAR ESTIMATIONS

Fundamental to the field of curve fitting in which estimation
of non-linear parameters is involved is the idea of a "best fit."
To obtain the "best fit", the principle of least squares is often
used. This principle underlies the research in this paper.

However,  three  other  methods  of  curve  fitting  are  mentioned 
in  this  report.  The  first  is  the  obvious  one  of  merely  plotting 
the  points  on  a  graph,  but  this  method  fails  when  the  number  of 
dimensions  is  greater  than  two. 

A method of grouping data into n groups and then using the
average of each group as one of the n observations will result in
n equations in n variables and, in general, a unique solution.
This method is called averaging. Personal judgment can influence
the result, since the grouping is not unique. Thus, the division
of ten observations into three groups involves some arbitrariness.

A  third  method  is  the  principle  of  selected  points,  which  is 
exactly  what  the  name  implies.  One  chooses  n  representative  points 
and  solves  a  resulting  set  of  n  equations  to  determine  the  n 
constants  or  parameters.  The  results  obtained  using  this  method 
would  seem  to  be  primarily  a  function  of  the  points  chosen,  and 
this  is  actually  the  case.  Regardless  of  this  shortcoming,  one 
would  gain  some  idea  of  the  values  of  the  parameters  if  he  is 
careful  in  the  selection  of  the  points.  This  idea  of  a  starting 
value  or  a  reasonably  good  estimate  is  basic  to  the  Gauss-Newton 
iteration  procedure  of  estimating  non-linear  parameters. 

Basic to Regression Analysis is the idea of knowing what
family of curves to attempt to fit to a given set of data. In
other words, the form of the equation and the parameters must be
known before there can be an attempt to estimate the parameters.

Of first importance are the polynomial equations. If the
observations are such that the x steps or changes, $\Delta x$, are equal,
and the i-th difference ($\Delta^i y$) of y is approximately constant, then
an i-th degree polynomial would be a good choice for a curve to
which to fit the data. For example, if the $\Delta x$'s are equal and
$\Delta^2 y$ is a constant, then one should attempt to fit the data to the
family of the form $y = Ax^2 + Bx + C$.
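The difference check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `finite_differences` and `suggested_degree`, the tolerance, and the sample data are hypothetical and do not come from the report.

```python
def finite_differences(ys, order):
    """Return the order-th forward differences of equally spaced observations."""
    diffs = list(ys)
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return diffs


def suggested_degree(ys, tol=1e-9):
    """Smallest i whose i-th differences are constant within tol, else None."""
    # Stop before only one difference is left, since a single value is
    # trivially "constant" and would say nothing about the data.
    for i in range(len(ys) - 1):
        d = finite_differences(ys, i)
        if max(d) - min(d) <= tol:
            return i
    return None


# Hypothetical observations of y = 2x^2 + 3x + 1 at x = 0, 1, 2, 3, 4
ys = [2 * x * x + 3 * x + 1 for x in range(5)]
second = finite_differences(ys, 2)
degree = suggested_degree(ys)
print(second, degree)
```

With real measurements the differences are only approximately constant, so the tolerance would have to be chosen with the size of the measurement error in mind.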

One should note the delusion involved in increasing i, the
degree of the polynomial. As i approaches n, the error sum of
squares will decrease, since for i = n, the error sum of squares
equals 0, because n points will determine the n parameters or
variables. As one increases the degree of the polynomial, the
number of parameters being estimated increases, and, hence,
the degrees of freedom decrease. Comparisons in curve fittings
are made only when the degrees of freedom in the respective
fittings are the same. Of importance also is the idea that one
is fitting to a curve an approximation that fluctuates every time
there is an error in the measurement or the recording of the
measurement or a malfunction in the experimental apparatus.

There are at least three basic families of curves which
are of importance to industry and science that involve non-linear
parameters. The three are based on the exponential, the power,
and the hyperbolic functions.

To help one decide whether to attempt to fit a given set of
data to an exponential function of the form $y = ab^x$, one can make
a simple check which is derived from the properties of this family
of curves. When x is increased by one, y is multiplied by b.
Therefore, if the x's are in an arithmetic progression and the
corresponding y's form a geometric progression, the data are such
that they could be closely approximated by means of an exponential
function. For the modified exponential $y = a + bc^x$, where the x's
are in an arithmetic progression, the changes in y ($\Delta y$) form a
geometric progression although the y's themselves do not.

For the power function $y = ax^b$ where the x's are in a geometric
progression, the y's will also form a geometric progression.
For the case of the modified power function $y = a + bx^c$ where the
x's are in a geometric progression, the changes in y ($\Delta y$) will
form a geometric progression although the y's themselves do not
(10, p. 324).

For the hyperbolic equation of the form $y = a + b/x$, the
expression $\Delta y / \Delta(1/x)$ is a constant because in the space with axes
of 1/x and y, the equation is linear with slope $\Delta y / \Delta(1/x)$. For a
hyperbolic equation of the form $y = x/(a + bx)$, one can deduce
that $\Delta(x/y) / \Delta x$ is equal to a constant, by reasoning similar to
the above.
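The three family checks above (geometric progression of the y's, and constancy of $\Delta y / \Delta(1/x)$) lend themselves to a small sketch. The helper names, the relative tolerance, and all three data sets below are hypothetical illustrations, not data from the report.

```python
def ratios(values):
    """Successive ratios; constant ratios indicate a geometric progression."""
    return [b / a for a, b in zip(values, values[1:])]


def differences(values):
    """Successive differences; constant differences indicate an arithmetic progression."""
    return [b - a for a, b in zip(values, values[1:])]


def nearly_constant(values, rel_tol=1e-6):
    """True when all values agree to within rel_tol of the largest magnitude."""
    spread = max(values) - min(values)
    scale = max(abs(v) for v in values)
    return spread <= rel_tol * scale


# Exponential y = 3 * 2^x with x in arithmetic progression
exp_y = [3 * 2 ** x for x in [0, 1, 2, 3, 4]]
exp_ok = nearly_constant(ratios(exp_y))

# Power function y = 5 * x^1.5 with x in geometric progression
pow_y = [5 * x ** 1.5 for x in [1, 2, 4, 8]]
pow_ok = nearly_constant(ratios(pow_y))

# Hyperbola y = 2 + 3/x: the slope of y against 1/x should be constant
hyp_x = [1.0, 2.0, 3.0, 4.0]
hyp_y = [2 + 3 / x for x in hyp_x]
inv_x = [1 / x for x in hyp_x]
slopes = [dy / di for dy, di in zip(differences(hyp_y), differences(inv_x))]
hyp_ok = nearly_constant(slopes)

print(exp_ok, pow_ok, hyp_ok)
```

For measured data the ratios and slopes will only be roughly constant, so `rel_tol` would need to be loosened considerably.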

One  could  find  requirements  for  many  different  families  of 
equations;  however,  those  that  are  given  seem  to  be  the  ones  most 
commonly  used  and  should  serve  as  examples  in  case  others  are 
needed. 


THE GAUSS-NEWTON METHOD

Seemingly the idea of least squares was first formulated by
Legendre about 1806 (12, p. 210). He said that the most satisfactory
solution to a set of m linear equations in n variables,
with m greater than n, was the one giving a minimum to the sum of
squares of error. If the i-th equation is of the form (2.1),

(2.1)   $x_i = a_1 x_{1i} + a_2 x_{2i} + \cdots + a_n x_{ni}$   (i = 1, 2, ..., m)

then the function to be minimized is
$F(a_k;\ k = 1, 2, \ldots, n) = \sum_{i=1}^{m} E_i^2$, where $E_i$ is given by

$E_i = a_1 x_{1i} + a_2 x_{2i} + \cdots + a_n x_{ni} - x_i$   (i = 1, 2, ..., m).

The minimization can be done by setting the n first order partial
derivatives of $F(a_k;\ k = 1, 2, \ldots, n)$ with respect to the $a_k$
equal to zero. A system of n equations in n variables is obtained.
The equations are called the normal equations. Since it can be
shown that the coefficient matrix obtained in this manner has
rank n when the x's are linearly independent, there is a unique
solution of the system.

The development of the normal equations is given in part A
of the appendix. One can form the first normal equation for a set
of m equations in the following way: Multiply the first equation
by the coefficient of the first variable of the first equation,
multiply the second equation by the coefficient of the first
variable in the second equation, and so on for all m equations;
then add together the computed multiples of the original set of
equations, and the result is the first normal equation (see part B
in the appendix). By using the coefficients of the second variable,
one can obtain the second normal equation and, in similar
manner, form all n normal equations. One should note that the
coefficient matrix of equations obtained in this way will be symmetric.
This simplifies considerably the amount of work involved
in setting up the system and enables one to use results from the
theory of symmetric matrices.
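The multiply-and-add recipe above can be written out directly. The following is a minimal sketch, assuming the m equations are stored row by row; the function names `normal_equations` and `solve` and the four-equation example are hypothetical, and the elimination routine is an ordinary Gaussian elimination rather than anything prescribed by the report.

```python
def normal_equations(rows, rhs):
    """Form the n normal equations from m equations.

    rows[i][k] is the coefficient of the k-th unknown in equation i and
    rhs[i] is its right-hand side. Normal equation j is the sum over i of
    (coefficient of unknown j in equation i) times equation i.
    """
    n = len(rows[0])
    matrix = [[sum(r[j] * r[k] for r in rows) for k in range(n)] for j in range(n)]
    vector = [sum(r[j] * y for r, y in zip(rows, rhs)) for j in range(n)]
    return matrix, vector


def solve(matrix, vector):
    """Gaussian elimination with partial pivoting and back substitution."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    sol = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


# Hypothetical example: four equations a1 + t*a2 = y lying exactly on y = 1 + 2t
rows = [[1, 0], [1, 1], [1, 2], [1, 3]]
rhs = [1, 3, 5, 7]
matrix, vector = normal_equations(rows, rhs)
estimates = solve(matrix, vector)
print(matrix, vector, estimates)
```

The symmetry noted in the text shows up as `matrix[j][k] == matrix[k][j]`, so in hand computation only the upper triangle needs to be evaluated.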

The method of least squares is easily used whenever all the
parameters one wishes to estimate are involved linearly as the
coefficients of terms in a function. Whenever a parameter is
involved as an exponent, such as the parameter b in the function
$y = ax^b$, the method thus illustrated leads to a system of equations
for which the solution is very difficult to find. Most statisticians
are well acquainted with a transformation which will change
the above power function into a form such that the regular method
of least squares can be easily used; but an important aspect of
the situation is that few users of this technique realize its
shortcomings.

The technique of changing the form to one which involves a
function of a and a function of b linearly is that of taking the
logarithm of both sides. The resulting equation, log y = log a +
b log x (Y = A + bX), is then linear in log a and b and can be
fitted by the regular least squares method. In fact, in the space
with distance measured by log y and log x, one is guaranteed a
minimum for the error sum of squares.

Consider now an example of fitting the data given in Table 1
to a curve of the family $y = ax^b$ by this technique.

Table 1. Data for Power Function Fitting

x    y      X = log x    Y = log y    X^2      XY

1    2.5    0.           .3979        .0       .0
2    8.     .3010        .9031        .0906    .2718
3    19.    .4771        1.2788       .2276    .6101
4    50.    .6021        1.6990       .3625    1.0230

     ΣX = 1.3802    ΣY = 4.2788    ΣX^2 = .6807    ΣXY = 1.9049


The normal equations are as follows:

4.0000 A + 1.3802 b = 4.2788
1.3802 A + 0.6807 b = 1.9049

The solution is that b = 2.096 and A = 0.3472; hence, a = 2.224.
The required equation is then $y = 2.224x^{2.096}$. Table 2 shows the
error sum of squares to be equal to 100.202.

Table 2. Calculations to find the Error Sum of Squares

x    y (calculated)    y (observed)    E         E^2

1    2.224             2.5             .276      .076
2    9.510             8.              -1.510    2.280
3    22.243            19.             -3.243    10.517
4    40.655            50.             9.345     87.329

                                       ΣE^2 = 100.202
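The logarithmic fit of Tables 1 and 2 can be re-traced with a short sketch. It uses common logarithms as the tables do, but carries full precision instead of four-place logarithms, so the printed values are expected to be close to, not identical with, the tabulated ones. The variable names are illustrative.

```python
import math

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.5, 8.0, 19.0, 50.0]

# Transformed variables X = log x, Y = log y (base 10, as in Table 1)
X = [math.log10(x) for x in xs]
Y = [math.log10(y) for y in ys]

m = len(xs)
sx, sy = sum(X), sum(Y)
sxx = sum(v * v for v in X)
sxy = sum(u * v for u, v in zip(X, Y))

# Normal equations:  m*A + sx*b = sy   and   sx*A + sxx*b = sxy
det = m * sxx - sx * sx
b = (m * sxy - sx * sy) / det
A = (sy * sxx - sx * sxy) / det
a = 10 ** A

# Error sum of squares in the original (untransformed) variables
sse = sum((a * x ** b - y) ** 2 for x, y in zip(xs, ys))
print("a = %.3f  b = %.3f  SSE = %.2f" % (a, b, sse))
```

The error sum of squares is quite sensitive to the rounding of a and b, which is why the last digits differ from the hand computation.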


Now for contrast, consider what happens if one selects the
member of this family which passes through the last two points.
This requires that $19 = a \cdot 3^b$ and $50 = a \cdot 4^b$. Then solving for b
gives b = (log 50 - log 19) ÷ (log 4 - log 3) = 3.36 and, therefore,
a = 0.474. When one evaluates the error sum of squares,
one finds a somewhat surprising value of only 13.93. This is
considerably smaller than the "minimum"!


One needs to examine more closely what happens when one uses
the logarithmic transformation. After using the transformation,
one is then minimizing the sum of squares of the deviations of the
logarithms and not the error sum of squares of the deviations
themselves. Logarithms are such that if the logarithms of
two small numbers differ by 0.1, then the numbers can still be
close together; whereas the same difference in the logarithms of
larger numbers may correspond to a much larger difference in the
numbers. Since for the error sum of squares one is interested in
the differences of the numbers, one would need to weight the logarithmic
values to obtain the desired results. If one evaluates
the sum of squares of the deviations of the logarithms in the
first least squares fitting, one finds a value of 0.02093 while
the corresponding number for the selected points fit is considerably
higher at 0.56838.
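The contrast between the two criteria can be checked numerically. The sketch below takes the two parameter pairs quoted in the text (a = 2.224, b = 2.096 for the logarithmic fit; a = 0.474, b = 3.36 for the selected points fit) and evaluates both sums of squares for each; the function and dictionary names are illustrative.

```python
import math

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.5, 8.0, 19.0, 50.0]


def sse_raw(a, b):
    """Sum of squared deviations of y itself."""
    return sum((a * x ** b - y) ** 2 for x, y in zip(xs, ys))


def sse_log(a, b):
    """Sum of squared deviations of log10(y)."""
    return sum((math.log10(a * x ** b) - math.log10(y)) ** 2 for x, y in zip(xs, ys))


# Parameter values as quoted in the text
fits = {"log transform": (2.224, 2.096), "selected points": (0.474, 3.36)}

raw = {name: sse_raw(a, b) for name, (a, b) in fits.items()}
logs = {name: sse_log(a, b) for name, (a, b) in fits.items()}

for name in fits:
    print("%-16s raw SSE = %8.3f   log SSE = %.5f" % (name, raw[name], logs[name]))
```

Each fit wins under its own criterion: the logarithmic fit has the smaller sum of squares in log y, while the selected points fit has the smaller sum of squares in y.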

A  challenging  problem  has  now  arisen.  By  selecting  two 
points  through  which  to  force  the  two  parameter  family  of  curves, 
one  has  obtained  a  better  fit  by  the  least  squares  criterion  than 
one  obtained  by  using  a  logarithmic  transformation  followed  by  a 
fitting  of  the  transformed  variables  by  least  squares.  Since  one 
could  have  picked  any  combination  of  two  points  from  the  given 
set,  this  solution  would  have  only  a  small  probability  of  being 
the  one  that  would  give  a  minimum  to  the  error  sum  of  squares. 

It seems that no simple one-step method exists that will
guarantee the desired minimum; however, there are iterative
procedures that will guarantee a reduction in the error sum of
squares such that repetition of the process will enable the error
sum of squares to approach a minimum, since the error sum of
squares is bounded from below. The method used in general practice
to estimate non-linear parameters is now outlined. It is
known as the Gauss-Newton Method.

Assume that the system to be satisfied, as nearly as possible,
is the system of equations (2.2),

(2.2)   $E_1(x,y;a) = 0,\ E_2(x,y;a) = 0,\ \ldots,\ E_n(x,y;a) = 0$

where x represents the independent
variables, y represents the response variables, and a represents
the parameters. Assume also that an approximate solution of the
system is known. This last assumption is not unreasonable because
an approximate solution can be obtained by determining the values
of the parameters for the curve through a particular set of n
points by the selected points method as was done earlier.

For any particular set of values of $x_k$ (k = 1, 2, ..., n)
and the corresponding value of $y_k$, the value of $E_k$ is a function
of the parameters. For the purpose of illustration, consider the
case with two parameters. Represent these parameters by a and b
with the beginning approximations of them by $a_0$ and $b_0$.

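If E is a function which can be represented by a Taylor
series, then the system represented by (2.3) can be obtained.
Each partial derivative is evaluated at the starting values
$a = a_0$, $b = b_0$.

(2.3)   $E_i(a,b) = E_i(a_0,b_0) + (a - a_0)\frac{\partial E_i}{\partial a} + (b - b_0)\frac{\partial E_i}{\partial b} + \frac{1}{2!}\left[(a - a_0)^2\frac{\partial^2 E_i}{\partial a^2} + 2(a - a_0)(b - b_0)\frac{\partial^2 E_i}{\partial a\,\partial b} + (b - b_0)^2\frac{\partial^2 E_i}{\partial b^2}\right] + \cdots$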

Now, the equations of the system (2.4) are linear in the
unknown corrections $(a - a_0)$ and $(b - b_0)$ of the parameters. One
is able then to use the method of least squares previously developed
to estimate the adjustments to the starting values in place
of estimating the true values themselves. The solution of the
normal equations formed from (2.4) can be thought of as follows,

$f = B_0 + B_1 Z_1 + B_2 Z_2$

where $Z_i$ is the partial derivative of f with respect to the i-th
variable, and the B's are the estimates of the adjustments. Since
$B_1$ is estimating the quantity $(a - a_0)$, or $B_1 = a - a_0$, then
$a = a_0 + B_1$. From this, one obtains a new starting value of the
parameter a. Similarly, one obtains new starting values for the
other parameters. How close one actually gets to the true value
of the parameters depends on the original starting values themselves
for, if they are too much in error, neglect of higher order
terms of the adjustments is not justified. Hence, an interpretation
of the solution could definitely be misleading.

Continue now the work with the previous example to see if an
improvement in fitting the data of Table 1 to the family of curves
$y = ax^b$ can be made by using the Gauss-Newton method of estimating
non-linear parameters. By using the starting values of a and b
obtained when passing the curve exactly through the third and
fourth points, the following equations and first order expansions
are obtained. For simplification in writing let $u = (a - a_0) =
(a - .474)$ and $v = (b - b_0) = (b - 3.36)$.

$f_1 = a - 2.5$               $f_1 = -2.026 + u = 0$
$f_2 = a \cdot 2^b - 8.0$     $f_2 = -3.135 + 10.267u + 3.372v = 0$
$f_3 = a \cdot 3^b - 19.0$    $f_3 = 40.098u + 20.874v = 0$
$f_4 = a \cdot 4^b - 50.0$    $f_4 = 105.411u + 69.314v = 0$

The resulting normal equations are:

12,825.740 u + 8,179.084 v = 34.213
8,179.084 u + 5,251.525 v = 10.571

The corresponding solution is u = 0.197 and v = -0.305. Consequently,
the revised estimates of a and b are as follows:

a = 0.474 + 0.197 = 0.671
b = 3.36 - 0.305 = 3.055

For the function $y = 0.671x^{3.055}$, the error sum of squares
is 22.628, which is better than that obtained by the logarithmic
fit but still is not as good as the fit obtained by the selected
points method. Remember, this is an iterative procedure with the
results a function of the starting values. A second iteration
where the functions are expanded about the adjusted values of the
parameters gives the following results: a = 0.733, b = 3.039, and
an error sum of squares equal to 10.052. This is smaller than any
previously obtained error sum of squares. There is no reason to
stop here if one wishes to obtain the minimum. One should continue
until the improvement in the error sum of squares between successive
fittings is sufficiently small. However, this suffices to show
the improved results and how to use the technique.
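The two iterations above can be re-traced with a minimal sketch of the undamped Gauss-Newton step for $y = ax^b$. One assumption differs from the hand computation: the sketch uses the actual residuals at the third and fourth points (which are small but not exactly zero for the rounded starting values 0.474 and 3.36), so the corrections agree with the text only to about two or three figures. Function names are illustrative.

```python
import math

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.5, 8.0, 19.0, 50.0]


def residuals(a, b):
    return [a * x ** b - y for x, y in zip(xs, ys)]


def sse(a, b):
    return sum(f * f for f in residuals(a, b))


def gauss_newton_step(a, b):
    """Return the corrections (u, v) from the linearized normal equations."""
    f = residuals(a, b)
    da = [x ** b for x in xs]                      # partial of f with respect to a
    db = [a * x ** b * math.log(x) for x in xs]    # partial of f with respect to b
    saa = sum(p * p for p in da)
    sab = sum(p * q for p, q in zip(da, db))
    sbb = sum(q * q for q in db)
    ra = -sum(p * r for p, r in zip(da, f))
    rb = -sum(q * r for q, r in zip(db, f))
    det = saa * sbb - sab * sab
    u = (ra * sbb - sab * rb) / det
    v = (saa * rb - sab * ra) / det
    return u, v


a, b = 0.474, 3.36            # starting values from the selected points fit
history = [(a, b, sse(a, b))]
for _ in range(2):
    u, v = gauss_newton_step(a, b)
    a, b = a + u, b + v
    history.append((a, b, sse(a, b)))

for row in history:
    print("a = %.3f  b = %.3f  SSE = %.3f" % row)
```

Raising the iteration count and stopping when the change in the error sum of squares becomes small would follow the stopping advice given in the text.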




MODIFICATIONS 

An extension of the standard method of estimating non-linear
parameters was developed by Kenneth Levenberg (8, p. 164) of the
Frankford Arsenal. His method insures an improvement of the initial
solution. One might appropriately ask why this is necessary.
The reason is that the usual procedure has produced new values of
the parameters which are not sufficiently close to the initial
ones and has given rise to larger values of the sum of squares of
the residuals (the error sum of squares) than that corresponding
to the initial solution.

Let $h(x,y,z,\ldots;\alpha,\beta,\gamma,\ldots)$ be the function to be approximated
or the actual true function in nature and $H(x,y,z,\ldots;a,b,c,\ldots)$
be the approximating function, where a, b, c, ... are the least
squares estimates of the unknown parameters $\alpha, \beta, \gamma, \ldots$ and x, y,
z, ... are the observed variables. Then the residuals at the
points $(x_i, y_i, z_i, \ldots)$, i = 1, 2, ..., m are

$f_i(a,b,c,\ldots) = H(x_i,y_i,z_i,\ldots;a,b,c,\ldots) - h(x_i,y_i,z_i,\ldots;\alpha,\beta,\gamma,\ldots)$.

The least squares criterion requires minimization of the expression
in (3.1).

(3.1)   $s(a,b,c,\ldots) = \sum_{i=1}^{m} f_i^2$

Choosing an initial solution, $p_0 = (a_0, b_0, c_0, \ldots)$, at which
it is assumed s does not have a stationary value, the first order
Taylor expansion of the residuals taken about $p_0$ gives a set of
linear approximations (3.2) to the residuals, where $\Delta a = a - a_0$,
$\Delta b = b - b_0$, ..., and the partial derivatives are evaluated at $p_0$.

(3.2)   $f_i(a,b,c,\ldots) \approx F_i(a,b,c,\ldots) = f_i(p_0) + \frac{\partial f_i}{\partial a}\Delta a + \frac{\partial f_i}{\partial b}\Delta b + \cdots$

The Gauss-Newton method then requires the minimizing of S
given in (3.3) by setting the partial derivatives of S with
respect to the various parameters equal to zero.

(3.3)   $S(a,b,c,\ldots) = \sum_{i=1}^{m} F_i^2$

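The system (3.4) is obtained, where the bracket [ ] symbol stands
for summation of products of the partial derivatives, e.g.
$[aa] = \sum_{i=1}^{m} \left(\frac{\partial f_i}{\partial a}\right)^2$,
$[ab] = \sum_{i=1}^{m} \frac{\partial f_i}{\partial a}\frac{\partial f_i}{\partial b}$, and
$[a0] = \sum_{i=1}^{m} \frac{\partial f_i}{\partial a} f_i(p_0)$. Note
that these partial derivatives are evaluated at the starting
point of each iteration.

(3.4)   $\frac{1}{2}\frac{\partial S}{\partial a} = [aa]\Delta a + [ab]\Delta b + \cdots + [an]\Delta n + [a0] = 0$

        $\frac{1}{2}\frac{\partial S}{\partial b} = [ba]\Delta a + [bb]\Delta b + \cdots + [bn]\Delta n + [b0] = 0$

        . . .

        $\frac{1}{2}\frac{\partial S}{\partial n} = [na]\Delta a + [nb]\Delta b + \cdots + [nn]\Delta n + [n0] = 0$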

One should note that large values of the increments $\Delta a, \Delta b, \ldots$
obtained by solving the above normal equations (3.4) may introduce
a large error into the first order Taylor approximation of the
next cycle such that a decrease in S, defined in equation (3.3),
may not correspond to a decrease in s (3.1). In such cases, it
would seem advisable to limit or damp the absolute values of the
increments of the parameters in order to improve on the first
order Taylor approximation and to minimize simultaneously, if
possible, the sum of the squares of the approximating residuals
under these damped conditions. In order to make the increments
and the residuals small in absolute value, one can apply the
idea of least squares to the expression (3.5) where A, B, C, ...
is a set of positive constants or weighting factors expressing
the relative importance of damping the different increments, and
w is a positive quantity indicating the relative importance of
the residuals and increments in the minimizing process.

(3.5)   $\bar{S}(a,b,c,\ldots) = wS(a,b,c,\ldots) + A(\Delta a)^2 + B(\Delta b)^2 + C(\Delta c)^2 + \cdots$

Denote the point at which $\bar{S}$ takes its minimum, for a given positive
value of w, by $p_w = (a_w, b_w, c_w, \ldots)$ and set
$Q(a,b,c,\ldots) = A(\Delta a)^2 + B(\Delta b)^2 + C(\Delta c)^2 + \cdots$.

Under the assumption that s is not stationary at $p_0$, one can
obtain the following inequality:

$wS(p_w) < wS(p_w) + Q(p_w) = \bar{S}(p_w) < \bar{S}(p_0) = wS(p_0) + Q(p_0) = wS(p_0)$.

It follows then that $S(p_w)$ is less than $S(p_0)$, and this shows that
the minimization of (3.5) will diminish the sum of squares of the
approximating residuals S.

By denoting the standard least squares solution to (3.4) by
$p_\infty$, the following inequality can be obtained:

$wS(p_w) + Q(p_w) = \bar{S}(p_w) < \bar{S}(p_\infty) = wS(p_\infty) + Q(p_\infty) < wS(p_w) + Q(p_\infty)$.

It follows then that $Q(p_w)$ is less than $Q(p_\infty)$, which shows that the
increments given by the standard least squares solution of (3.4)
will be improved in the sense that the weighted sum of their
squares, Q, will be reduced.

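The normal equations resulting in minimizing (3.5) are nearly
the same as the system (3.4). The partial derivatives are as
follows: $\frac{\partial \bar{S}}{\partial a} = w\frac{\partial S}{\partial a} + 2A\,\Delta a$, $\frac{\partial \bar{S}}{\partial b} = w\frac{\partial S}{\partial b} + 2B\,\Delta b$, ....

After dividing these partial derivatives by 2w, the resulting
system can be simplified to the following:

$\left([aa] + \frac{A}{w}\right)\Delta a + [ab]\Delta b + \cdots + [an]\Delta n + [a0] = 0$

$[ba]\Delta a + \left([bb] + \frac{B}{w}\right)\Delta b + \cdots + [bn]\Delta n + [b0] = 0$

. . .

$[na]\Delta a + [nb]\Delta b + \cdots + \left([nn] + \frac{N}{w}\right)\Delta n + [n0] = 0$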

This system is the same as (3.4), except for the coefficients of
the principal diagonal elements, which are increased by quantities
proportional to the weighting factors A, B, C, ..., respectively.
One is now able to see that by letting w become infinite, one
obtains the regular normal equations and, hence, the reason for
the notation, $p_\infty$, as the ordinary least squares solution. This
method will be referred to as "damped least squares."
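A damped step can be illustrated on the power function example of Table 1, starting from a = 0.474, b = 3.36, where the undamped step was seen to raise the true error sum of squares. The sketch assumes unit weighting factors (A = B = 1), so the damping amounts to adding 1/w to the two diagonal coefficients; the trial values of 1/w and the function names are hypothetical choices for illustration.

```python
import math

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.5, 8.0, 19.0, 50.0]


def sse(a, b):
    """True error sum of squares s for y = a*x**b."""
    return sum((a * x ** b - y) ** 2 for x, y in zip(xs, ys))


def damped_step(a, b, inv_w):
    """Solve the damped normal equations with A = B = 1.

    inv_w = 1/w is added to the diagonal; inv_w = 0 gives the undamped step.
    """
    f = [a * x ** b - y for x, y in zip(xs, ys)]
    da = [x ** b for x in xs]
    db = [a * x ** b * math.log(x) for x in xs]
    aa = sum(p * p for p in da) + inv_w
    bb = sum(q * q for q in db) + inv_w
    ab = sum(p * q for p, q in zip(da, db))
    a0 = sum(p * r for p, r in zip(da, f))
    b0 = sum(q * r for q, r in zip(db, f))
    det = aa * bb - ab * ab
    delta_a = (-a0 * bb + ab * b0) / det
    delta_b = (-aa * b0 + ab * a0) / det
    return delta_a, delta_b


a_start, b_start = 0.474, 3.36
s_start = sse(a_start, b_start)

# Try several damping levels and record the true sum of squares after one step
trial = {}
for inv_w in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    d_a, d_b = damped_step(a_start, b_start, inv_w)
    trial[inv_w] = sse(a_start + d_a, b_start + d_b)

best = min(trial, key=trial.get)
print("start: %.3f" % s_start)
for inv_w in sorted(trial):
    print("1/w = %7.1f   s after step = %.3f" % (inv_w, trial[inv_w]))
print("best trial 1/w:", best)
```

Scanning a handful of trial values and keeping the best one mirrors the graphical search over w attributed to Levenberg; it is not a tuned or recommended schedule.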

It has been proven above that the sum of squares of the
approximating residuals can be reduced by this technique; however,
it remains to be shown that the sum of squares of the true residuals,
s, can be diminished. This is done by showing that the
derivative of s with respect to w is negative at some point. It
can be shown (8, p. 167) that

$\left(\frac{ds}{dw}\right)_{w=0} = -2\left(\frac{[a0]^2}{A} + \frac{[b0]^2}{B} + \cdots + \frac{[n0]^2}{N}\right)$

which shows that the derivative of s with respect to w is negative
at w = 0, since the fact that not all the partial derivatives
are zero follows from the assumption that s is not stationary at
the point $p_0$, and that A, B, C, ... are positive constants. All
this assures one that there is a value of w which can be found
for which the sum of squares of the true residuals will be reduced.

The best value of w to use may be determined in theory by
solving the following equation:

$\frac{ds}{dw} = 0$.

However, due to the generally complicated nature of this equation,
a first order Taylor expansion is generally substituted, giving
the following:

$s(p_w) = s(p_0) + w\left(\frac{ds}{dw}\right)_{w=0}$.

By assuming that $p_0$ was chosen so that the decreased value $s(p_w)$
is small, one can obtain the following equation for w.

$w = -\frac{s(p_0)}{\left(\frac{ds}{dw}\right)_{w=0}}$

Levenberg (8, p. 167) says that this value of w may be improved
upon by calculating $s(p_w)$ for several different trial values of
w, so that the approximate minimum may be located graphically.
One with a little experience will usually be able to get a good
idea of the general order of magnitude of the best value of w,
especially in connection with fitting a particular function H,
after graphing only a few points.

One should note that the coefficients of the weighting system
A, B, C, ... have been left arbitrary, the only restriction being
that the weighting factors be positive. If one imposes the criterion
that the directional derivative of s, taken at w = 0 along the curve
$a = a_w$, $b = b_w$, ..., should have its minimum value, namely, the
negative gradient, then the result is that the factors A, B, C, ...
are all equal. One can see by equation (3.5) that no generality
is lost by letting them all equal unity. For this weighting system,
the formation of the damped normal equations may be thought
of as being accomplished simply by the addition of a positive
constant, 1/w, to the coefficients of the principal diagonal of
the standard normal equations. Another weighting system which
has been used successfully is A = [aa], B = [bb], ..., N = [nn];
in this case the damped normal equations are formed by multiplying
the coefficients of the principal diagonal of the standard normal
equations by a constant factor greater than unity, namely, 1 + 1/w.

Since there is generally a tendency to go beyond the desired
adjustment (6, p. 227), there has been introduced in practice the
idea of damping the adjustments, say to seventy-five per cent of
the least squares estimates. This is done to every parameter
which one is estimating. The results of this technique are sometimes
improved in that the speed of convergence is faster, but in
other cases the opposite is true. Such a process, then, would not
be recommended for a multi-purpose high speed digital computer
program.
In this method of damped least squares, one is not required
to decide on an arbitrary pre-assigned procedure restricting all
the variables to the same extent. It seems that it is the greater
degree of freedom given by this method to the individual variables
that accounts for the fact that this method has solved (8, p. 168)
types of problems which are of much greater complexity than those
to which the principle of least squares is ordinarily applied.
The rate of convergence has been sufficiently fast.

Almost simultaneous with Levenberg's publication was one (3)
published by Haskell B. Curry, also of Frankford Arsenal, concerning
non-linear estimations by using the method of steepest descent.
His method has a much more theoretical basis but yields approximately
the same results and, therefore, only brief reference is
made to it in this report. There has been, however, a more recent
development in the field of estimating non-linear parameters, and
the intuitive notion that the adjustments tend to over-correct
seems to be included.

Dr. H. O. Hartley, a professor at Iowa State University, has
developed (6, p. 269) another modification of the standard Gauss-Newton
method for estimating non-linear parameters by using the
principle of least squares. This modification follows the standard
method until one finds a first solution to the normal equations.
At this point in the process, one can define a function of v as
follows:

$S(v) = S(x, \theta_0 + vD)$,   $0 \le v \le 1$.

The x represents the observed variables, $\theta_0$ represents the
parametric variables evaluated at the starting point, and D represents
the least squares solutions just obtained from the normal
equations. Hence, $\theta_0 + vD$, where v is a scalar, represents the
parametric variables evaluated at their starting values plus a
part of the adjustment indicated by the least squares solution.
If one then denotes by v' the value of v for which S(v) is a
minimum on the unit interval of v and defines $\theta_1 = \theta_0 + v'D$, one
can conclude that $S(x,\theta_1) < S(x,\theta_0)$. One can then repeat the
process by starting with a Taylor expansion about $\theta_1$ and will be
assured convergence to the minimum under the assumption that $\theta_0$
is not a stationary point of S. The process up to this point seems to
be quite similar to the one developed by Levenberg.

Hartley proves (6, p. 272) that under the given set of conditions:
first, the functions are continuous; second, the coefficient
matrix has full rank; and third, there exists a starting vector $\theta_0$
in the interior of a bounded convex set T such that $S(x,\theta_0) < \bar{S}$,
where $\bar{S}$ equals the lim inf $S(x,\theta)$ on $\bar{T}$, the complement of T; then
by using S(v'), one will obtain in the limit a unique minimum.
However, the worth of this modification and a significant difference
from the previous methods seem to lie in the fact that one
need not find this v' exactly but can use a relatively easily
obtained approximation to it, say $v_p$, as given in (3.6).

(3.6)   $v_p = \frac{1}{2} + \frac{1}{4}\,\frac{S(0) - S(1)}{S(1) - 2S(\frac{1}{2}) + S(0)}$


This is nothing more than finding the sum of squares of residuals
at $(\theta_0 + \frac{1}{2}D)$, which is equal to $S(\frac{1}{2})$, and
then finding the $v$ co-ordinate of the horizontal tangent to a
"parabola" fitted through the three points $[0, S(0)]$,
$[\frac{1}{2}, S(\frac{1}{2})]$, and $[1, S(1)]$ in a $[v, S(v)]$ space.
Then one is ready to repeat the process using
$\theta_1 = \theta_0 + v_p D$ as the starting vector.
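The parabola step of (3.6) can be sketched in a few lines. This is only an illustration: the function name `parabola_step` is hypothetical, and the three sums of squares passed to it are the cycle 1 values quoted in the fertilizer example of this report.

```python
def parabola_step(s0, s_half, s1):
    """Abscissa of the horizontal tangent of the parabola through
    (0, s0), (1/2, s_half) and (1, s1), following (3.6)."""
    curvature = s1 - 2.0 * s_half + s0
    if curvature == 0.0:
        # The three points are collinear, so there is no vertex to find.
        raise ValueError("S(0), S(1/2), S(1) are collinear; (3.6) is undefined")
    return 0.5 + 0.25 * (s0 - s1) / curvature


# Error sums of squares from cycle 1 of the fertilizer example.
v_p = parabola_step(27376.8, 17400.7, 14586.0)
print(round(v_p, 4))  # 0.9465
```

The function does not check whether the tangent point is a minimum or whether it lies on the unit interval; those checks are a separate question.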

Consider now a numerical example of the modified Gauss-Newton
method. The data of Table 3 have been taken from a fertilizer
experiment in which there were six responses representing the
yields of wheat corresponding to six rates of application of
fertilizer which, on a coded scale, are given the values in the table.

Table 3. Data from Fertilizer Experiment

      x        y
     -5      127
     -3      151
     -1      379
      1      421
      3      460
      5      426

It is intended to fit this data to a curve representing the
exponential law of the following form.

$y = L + Be^{Kx}$

This equation is sometimes called Mitscherlich's Law of Diminishing
Returns, in which x is the only input variable. L is the asymptotic
yield for large rates of fertilizer application, K is the
exponential rate of the response decrease, and B defines the mid-point
response at x = 0 by giving it the value L + B. For this example,
choose the following starting values.

$L_0 = 580$       $B_0 = -180$       $K_0 = -.160$

The partial derivatives needed for (2.4) are:

$\frac{\partial y}{\partial L} = 1$       $\frac{\partial y}{\partial B} = e^{Kx}$       $\frac{\partial y}{\partial K} = Bxe^{Kx}$

The normal equations are exhibited in Table 4 where $D_1 = L - L_0$,
$D_2 = B - B_0$, and $D_3 = K - K_0$.

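Table 4. The formulas for the coefficients in the normal equations
of Mitscherlich's law.

$$
\begin{array}{ccc|c}
D_1 & D_2 & D_3 & \\ \hline
n & \sum e^{K_0 x} & B_0 \sum x e^{K_0 x} & \sum (y - y_0) \\
\sum e^{K_0 x} & \sum e^{2K_0 x} & B_0 \sum x e^{2K_0 x} & \sum (y - y_0) e^{K_0 x} \\
B_0 \sum x e^{K_0 x} & B_0 \sum x e^{2K_0 x} & B_0^2 \sum x^2 e^{2K_0 x} & B_0 \sum (y - y_0) x e^{K_0 x}
\end{array}
$$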

The evaluation of the entries of Table 4 is given in Table 5,
and the solution of the normal equations is

$D_1 = -89.68$       $D_2 = 58.89$       $D_3 = -.06312$

Table 5. Coefficients in the normal equations for cycle 1
(the third column multiplies $D_3 B_0$, and the third equation has been
divided through by $B_0$)

     D_1          D_2         D_3 B_0      Right-hand side
      6.          6.935       -12.194         -267.634
      6.935      10.253       -31.093         -370.782
    -12.194     -31.093       157.927         1055.627

The next stage is to find the minimum of $S(v)$ on the unit
interval of $v$. One can use the approximating parabola corresponding
to (3.6) to find the minimum. The error sums of squares are

$S(0) = 27376.8$     $S(\frac{1}{2}) = 17400.7$     $S(1) = 14586.0$,

and it follows from (3.6) that
$v_p = \frac{1}{2} + \frac{1}{4}(12790.8 / 7161.4) = .9465$. To find the
next trial values of the parameters, one replaces $L_0$ by
$L_0 + v_p D_1$ and the other parameters by similar expressions. The
following values are obtained.

$L_1 = 495.2$     $B_1 = -124.3$     $K_1 = -.2197$

These and further results are listed in Table 6.


Table 6. Convergence of least squares estimates in four cycles.

    Cycle        L           B           K          S(0)
      1       580.00     -180.00     -.16000      27376.
      2       495.21     -124.26     -.21974      14590.
      3       524.       -159.27     -.19065      13639.
      4       5H.        -152.49     -.20354      13394.
              523.3      -157.0      -.1994
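One cycle of the procedure worked through above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used for the report: the function names are hypothetical, NumPy is assumed to be available, and the step uses the parabola approximation (3.6) without any safeguard for the case where the tangent point falls outside the unit interval.

```python
import numpy as np

# Data of Table 3 and the starting values (L0, B0, K0) used in the text.
x = np.array([-5.0, -3.0, -1.0, 1.0, 3.0, 5.0])
y = np.array([127.0, 151.0, 379.0, 421.0, 460.0, 426.0])
theta0 = np.array([580.0, -180.0, -0.160])


def model(theta):
    L, B, K = theta
    return L + B * np.exp(K * x)


def sum_sq(theta):
    """Error sum of squares S at the parameter vector theta."""
    resid = y - model(theta)
    return float(resid @ resid)


def jacobian(theta):
    """Columns are the partial derivatives with respect to L, B and K."""
    _, B, K = theta
    e = np.exp(K * x)
    return np.column_stack([np.ones_like(x), e, B * x * e])


def modified_gn_cycle(theta):
    """One cycle: solve the normal equations for D, then take the
    fraction v of D given by the parabola formula (3.6)."""
    J = jacobian(theta)
    D = np.linalg.solve(J.T @ J, J.T @ (y - model(theta)))
    s0, s_half, s1 = sum_sq(theta), sum_sq(theta + 0.5 * D), sum_sq(theta + D)
    v = 0.5 + 0.25 * (s0 - s1) / (s1 - 2.0 * s_half + s0)
    return theta + v * D, D, v, s0


theta1, D, v, s0 = modified_gn_cycle(theta0)
print("S(0) =", round(s0, 1))
print("D    =", D)
print("v    =", round(v, 4))
print("next trial values:", theta1)
```

Because the normal equations of Table 5 are rather ill-conditioned, adjustments computed from coefficients rounded to three decimals can differ from a full-precision solution by about one percent.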

FURTHER  MODIFICATIONS 

The modified Gauss-Newton method as developed by Hartley and
at least hinted at earlier by G. E. P. Box (6, p. 280) is one
method of estimating non-linear parameters in regression analysis
that is particularly well suited for application on high speed
digital computers. In fact, Mr. Carlton Hassell and Dr. Dale
Cooper, both of the Mathematical Research section of Continental
Oil Company, have worked for some time in this area and plan to
publish their results soon. Their program, called HOHLN, was
written for the International Business Machines 7090 and is
capable of solving a great number of the conceivable practical
problems. It uses the modification given by Hartley.

Presented now are some additional ideas of interest in
estimating non-linear parameters, with the majority of the results
being applicable to computers. One can modify even more the
Hartley modification of the Gauss-Newton method. One of the best
changes in technique is to eliminate the need for having a starting
value of any of the linear parameters. One could fit a curve
for various values of the non-linear parameters, with each find
the corresponding least squares estimate of the linear parameters
and then, by graphing the results, find the value of the non-linear
parameters where one should hope to find a smaller error sum of
squares. Here will be given a method of finding the adjustment of
the non-linear parameters without having to estimate any linear
parameters and also not relying as much on a trial and error
procedure as the graphing procedure just described.

Hartley considers (6, p. 274) an equation of the form (4.1)
where y is the response variable and x is the independent variable.

(4.1)  $y = r + se^{tx} = f(x)$.

The letters r, s, and t, therefore, stand for parameters to be
estimated but only r and s are involved linearly. Hence this is
a non-linear estimation problem. Consider then the first order
Taylor expansion of (4.1) given in (4.2) and simplified in (4.3).

(4.2)  $y \doteq f_0 + (r - r_0)\frac{\partial f}{\partial r} + (s - s_0)\frac{\partial f}{\partial s} + (t - t_0)\frac{\partial f}{\partial t}$

(4.3)  $y \doteq f_0 + (r - r_0) + (s - s_0)e^{t_0 x} + (t - t_0)s_0 x e^{t_0 x}$

Since the partial derivative of f with respect to r does not
involve the variable x, one can consider (4.3) in the form (4.4)
where $B_0$, $B_1$, and $B_2$ are parametric constants being solved for
by the general method of least squares.

(4.4)  $y = B_0 + B_1 e^{t_0 x} + B_2 x e^{t_0 x}$

When comparing (4.3) and (4.4), and noting that
$f_0 = r_0 + s_0 e^{t_0 x}$, the estimations (4.5) follow.

(4.5)  $B_0 \doteq r_0 + (r - r_0)$     $B_1 \doteq s_0 + (s - s_0)$     $B_2 \doteq (t - t_0)s_0$

If one will now delete the last term of (4.3), the equation
is of the form (4.6), which is the same as (4.1) with t held at $t_0$.

(4.6)  $y = b_0 + b_1 e^{t_0 x}$

If one then fits the same data to (4.6) as fitted to (4.4),
the estimations (4.7) follow.

(4.7)  $b_0 \doteq r_0$       $b_1 \doteq s_0$

By combining the last approximations in (4.5) and (4.7), one
obtains the estimate given by (4.8).

(4.8)  $\frac{B_2}{b_1} \doteq t - t_0$      or      $t \doteq t_0 + \frac{B_2}{b_1}$

This adjustment to t to obtain a better approximation of the
true value of the non-linear parameter t has been done without
assuming any particular value for any of the linear parameters.
The corresponding values of $b_0 = r_0$ and $b_1 = s_0$ are the least
squares estimates of those parameters r and s which go with the
particular value $t_0$ of t which has been used.
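The adjustment (4.8) can be sketched as two ordinary linear least squares fits per cycle. This is only an illustration: the function name `adjust_t` is hypothetical, NumPy is assumed, and the data are synthetic values generated from assumed parameters r = 500, s = -150, t = -0.2 so that the result can be checked.

```python
import numpy as np


def adjust_t(x, y, t0):
    """One adjustment of the non-linear parameter t by (4.8).

    Fits (4.4) to obtain B2 and (4.6) to obtain b0 and b1; no starting
    values for the linear parameters r and s are needed."""
    e = np.exp(t0 * x)
    full = np.column_stack([np.ones_like(x), e, x * e])  # terms of (4.4)
    reduced = full[:, :2]                                # terms of (4.6)
    B = np.linalg.lstsq(full, y, rcond=None)[0]
    b = np.linalg.lstsq(reduced, y, rcond=None)[0]
    return t0 + B[2] / b[1], b[0], b[1]


# Synthetic, noise-free data (an assumption made for this sketch only).
x = np.array([-5.0, -3.0, -1.0, 1.0, 3.0, 5.0])
y = 500.0 - 150.0 * np.exp(-0.2 * x)

t = -0.16  # starting value for the non-linear parameter only
for _ in range(20):
    t, r_hat, s_hat = adjust_t(x, y, t)

print(t, r_hat, s_hat)
```

The values `r_hat` and `s_hat` returned at each cycle are the least squares estimates of r and s that go with the value of t used in that cycle.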

The generalization of the above example is rather
straightforward. A constant term in f causes no difficulty because in
the Taylor expansion it can be incorporated into $B_0$ of (4.4), and it
is estimated by a regular least squares method in (4.6). The
coefficient of a term is taken care of when it is estimated in
(4.6). The adjusted value of a non-linear parameter is obtained
as in (4.8).

If two non-linear parameters are assumed as in the following
equation,

$y = r + s_1 e^{t_1 x} + s_2 e^{t_2 x^2}$,

then one needs only to add the corresponding partial derivatives
to (4.3) and the corresponding term to (4.6) before making the
fittings. However, if two non-linear parameters are involved in
one term in such a way that they cannot be combined into one
non-linear parameter, then this technique breaks down due to the
interaction of the two non-linear parameters and the need to
estimate functions of the observed variables.

The step of deleting a term, as was done in going from (4.4)
to (4.6), can be included in a computer program very easily. One
can use a first matrix to store the elements of the normal
equations of all the terms formed and then use a second matrix in
solving the normal set where one transfers to the second matrix
only the terms of present interest. Since all terms are still
stored in the first matrix, one can obtain all combinations that
might be desired.
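The two-matrix bookkeeping described above might look like the following sketch. The function name `solve_subset` is hypothetical, NumPy is assumed, and the data are those of Table 3 with the starting value -.160 used for the non-linear parameter.

```python
import numpy as np


def solve_subset(N, g, keep):
    """Copy the rows and columns listed in `keep` from the stored normal
    equations (N, g) into a second, smaller system and solve it."""
    idx = np.asarray(keep)
    return np.linalg.solve(N[np.ix_(idx, idx)], g[idx])


x = np.array([-5.0, -3.0, -1.0, 1.0, 3.0, 5.0])
y = np.array([127.0, 151.0, 379.0, 421.0, 460.0, 426.0])
t0 = -0.160

e = np.exp(t0 * x)
A = np.column_stack([np.ones_like(x), e, x * e])  # all terms of (4.4)

# First matrix: normal equations for every term, formed once.
N = A.T @ A
g = A.T @ y

B = solve_subset(N, g, [0, 1, 2])  # fit of (4.4): B0, B1, B2
b = solve_subset(N, g, [0, 1])     # fit of (4.6): last term deleted
t1 = t0 + B[2] / b[1]              # adjustment (4.8)
print(B, b, t1)
```

Since the first matrix is never overwritten, any other combination of terms can be solved later without going back through the data.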

If one examines the $v_p$ of (3.6), he can find some interesting
ideas that should be considered if he is to program a high speed
computer to solve non-linear estimation problems. Even if one is
attacking this lengthy problem by hand, an aid here and there will
help. Since one is considering how much of the suggested least
squares adjustment should really be taken, one is interested in
the error sum of squares as v ranges from zero to one. The
approximating curve generally used is a parabola through $S(0)$,
$S(\frac{1}{2})$, and $S(1)$, where $S(0)$, the sum of squares of
residuals at the starting point, and $S(1)$, correspondingly the sum
at the full adjustments, are already evaluated. $S(\frac{1}{2})$ is
then computed. For the minimum of $S(v)$ to be on the unit interval of
v, and where $\Delta S = S(1) - S(0)$, the value of $S(\frac{1}{2})$ must
satisfy inequality (4.10).

(4.10)  $S(\frac{1}{2}) \le \frac{1}{2}[S(0) + S(1)] - \frac{1}{4}|\Delta S|$

The inequality (4.10) implies that

$|S(\frac{1}{2}) - S(0)| \le \frac{1}{4}|\Delta S|$      or      $S(\frac{1}{2}) < S(0)$.

This leads to an answer to the question of why this
evaluation should be included in a multipurpose non-linear program.
Since equation (3.6) furnishes merely a formula for finding the
abscissa of the horizontal tangent, the graph of the equation may
have a maximum at $v_p$; and if the following inequality is satisfied,

$S(\frac{1}{2}) > \frac{1}{2}[S(0) + S(1)] + \frac{1}{4}|\Delta S|$,

then the value of v obtained will be in the unit interval of v and
will correspond to a maximum for the error sum of squares. If one
is attempting to find a minimum, then any further evaluation would
be a waste of time. Also, if $S(\frac{1}{2})$ is below the average of
$S(0)$ and $S(1)$ but does not satisfy (4.10), then the horizontal
tangent found by using (3.6) will correspond to a minimum; it will be
located outside the unit interval of v and, hence, outside of the
domain of definition of $S(v)$.
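The cases just described can be collected into one small check. Requiring $0 \le v_p \le 1$ in (3.6) when the curvature $S(0) - 2S(\frac{1}{2}) + S(1)$ is positive gives $|\Delta S| \le 2[S(0) - 2S(\frac{1}{2}) + S(1)]$, which rearranges to (4.10); the maximum case is the mirror image. The sketch below is an illustration only, and the function name and the returned labels are hypothetical.

```python
def classify_tangent(s0, s_half, s1):
    """Classify the horizontal tangent of the parabola through
    S(0), S(1/2), S(1) without evaluating (3.6) itself."""
    mean = 0.5 * (s0 + s1)
    quarter = 0.25 * abs(s1 - s0)
    curvature = s0 - 2.0 * s_half + s1
    if curvature == 0.0:
        return "no tangent point"      # the three points are collinear
    if curvature > 0.0:                # S(1/2) lies below the average
        if s_half <= mean - quarter:   # inequality (4.10)
            return "minimum inside"
        return "minimum outside"
    if s_half >= mean + quarter:       # mirror image of (4.10)
        return "maximum inside"
    return "maximum outside"


# Cycle 1 of the fertilizer example satisfies (4.10).
print(classify_tangent(27376.8, 17400.7, 14586.0))  # minimum inside
```

A program could stop or shorten the step as soon as anything other than a minimum inside the unit interval is reported.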

One might appropriately ask if all of this is really relevant
to the problem or whether these situations actually occur. The
answer is yes; they happen often and in two particular places:
the beginning and the end. The occurrence at the beginning is
usually due to very poor starting values resulting in divergence.
A quick check of this tendency would allow one to relinquish the
computer and examine this particular problem more closely.

After running only a few iterations of this modified
Gauss-Newton method, one will generally be converging upon an answer.
Usually within only four or five iterations, even the accuracy of
the machine begins to play a part. On one fitting which was run
for nine cycles, one could see that a step function had been
obtained for the error sum of squares. This was attributed to
the round off error resulting from doing the computation with only
eight significant digits. Such errors are the cause of trouble at
the end of the converging process; but since one does want the best
fit available, it is worthwhile to continue as long as the sum of
squares of residuals is decreasing. When the process does give a
larger value at $S(1)$ than at $S(0)$, and $S(\frac{1}{2})$ is computed,
then by examining inequality (4.10), one can tell whether to continue
or to be satisfied with the acquired approximation.

Thus far in this section, consideration has been given to
techniques useful to computers in estimating the value of
non-linear parameters by the modified Gauss-Newton method as developed
by Hartley (6, p. 263). There is one aid for non-linear estimation
that is applicable to the general Gauss-Newton method and not to
the modified method.

If the regular Gauss-Newton method gives a first adjustment
to a non-linear parameter to be "add 1.00" and on the second
iteration gives an adjustment of "subtract 0.40," the percentage that
the second adjustment is of the first, disregarding the sign, is
40. In practice it is found that this percentage which the
(n + 1)th adjustment is of the nth is approximately constant.
Assuming that the ratio is constant leads one to an infinite
geometric series which has an easily calculated sum given by the
following, (4.11).

(4.11)  $\text{Sum} = \frac{a}{1 - r}$

The letter a stands for the value of the first adjustment and
r for the common ratio. From this, one is able to find after only
two iterations the value of parameters which ordinarily would have
taken eight or ten cycles to find. It would be advisable, however,
at least until more theory or research is done on this technique,
to check the value of the error sum of squares at this new point.
This is not added work since one is generally seeking the value of
the error sum of squares at this point. This is a simple aid to
help find a better approximation faster than the standard
Gauss-Newton methods would find it.
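The extrapolation (4.11) amounts to one division once two successive adjustments are known. The sketch below is an illustration only; the function name is hypothetical, and the ratio is taken with its sign, so the "add 1.00, subtract 0.40" example has r = -0.4.

```python
def extrapolated_adjustment(first, second):
    """Total adjustment predicted by (4.11), assuming successive
    adjustments form a geometric series with ratio second / first."""
    ratio = second / first
    if abs(ratio) >= 1.0:
        # The geometric series does not converge; (4.11) does not apply.
        raise ValueError("adjustments are not shrinking")
    return first / (1.0 - ratio)


# "Add 1.00" followed by "subtract 0.40" gives a total of about 0.714.
total = extrapolated_adjustment(1.00, -0.40)
print(round(total, 3))  # 0.714
```

As the text advises, the error sum of squares should still be evaluated at the extrapolated point before it is accepted.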

In practice, r is usually negative, which means, as
illustrated in the accompanying graph, that the tendency is to
"over-adjust."

[Figure: Diagram of adjustments on the non-linear parameter t]


This characteristic of over-correction corresponds with
Hartley's principle of finding the minimum of $S(v)$ on the unit
interval of v. If the expected value of the sought for non-linear
parameter were outside this unit interval, then one would need to
extend the domain of $S(v)$ to the whole real line; and this is just
exactly the opposite of what has been sought. Most of the
difficulties that have occurred in non-linear estimation by the
standard Gauss-Newton method have been associated with too great an
adjustment on the parameters, thus ruining the first order
approximation obtained by Taylor series and resulting in an increase
in the error sum of squares.

Mentioned earlier was the fact that this geometric series
(4.11) is not formed by the modified Gauss-Newton method. The
reason is that the process of finding the minimum on the unit
interval of v does not retain the seemingly inherent property of
the Taylor series expansion allowing the ratio of successive
adjustments to the parameters to remain constant. If a geometric
series were formed, this would mean that the minimum is always the
same percentage of the distance from the starting value to the
adjusted value.

The application of this process to a computer is not of too
much value for two reasons. The first is that the modified method
is not too much more difficult to program, and it speeds up the
convergence from requiring ten cycles down to as few as four.
Second, one iteration of HOHLN on the I.B.M. 7090 requires
approximately four seconds.


APPENDIX

Part A: Development of the Normal Equations

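Let $y_i$ be the observed ordinate and $Y_i$ be the calculated
approximation to $y_i$. Therefore,

$y_i \doteq Y_i = \sum_{j=1}^{n} a_j x_{ij}$      $(i = 1, 2, \ldots, m)$.

Let the residuals, $E_i$, be defined as follows:

$E_i = \sum_{j=1}^{n} a_j x_{ij} - y_i$      $(i = 1, 2, \ldots, m)$.

Hence,

$E_i^2 = \sum_{j=1}^{n} \sum_{k=1}^{n} a_j a_k x_{ij} x_{ik} - 2 y_i \sum_{j=1}^{n} a_j x_{ij} + y_i^2$

for $i = 1, 2, \ldots, m$; and

$P = \sum_{i=1}^{m} E_i^2$.

Considering the first partial derivative of P with respect
to $a_j$, one finds

$\frac{\partial P}{\partial a_j} = 2 \sum_{i=1}^{m} \Bigl( \sum_{k=1}^{n} a_k x_{ik} - y_i \Bigr) x_{ij}$,

which simplifies to

$\frac{\partial P}{\partial a_j} = 2 \sum_{i=1}^{m} \sum_{k=1}^{n} a_k x_{ij} x_{ik} - 2 \sum_{i=1}^{m} y_i x_{ij}$.

One finds that by setting the n first order partial derivatives,
$\frac{\partial P}{\partial a_j}$, $j = 1, 2, \ldots, n$, to zero, the
following system of equations must be satisfied:

$\sum_{i=1}^{m} \sum_{k=1}^{n} a_k x_{ij} x_{ik} = \sum_{i=1}^{m} y_i x_{ij}$      $(j = 1, 2, \ldots, n)$.

This system of equations is called the system of normal equations.
For a review of what is necessary to make certain that the solution
of the above system is a minimum of P, one can read page 188 of
Brand's "Advanced Calculus."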

Part B: A Method Used on Computers for Forming the Normal Equations

Consider the two parameter family of straight lines y = c + bx
for which five observations have been made, but where the
co-ordinates do not actually lie on a straight line. Assume the
resulting equations from the observations to be

(1)  c + 1.0b =  .9

(2)  c + 1.9b = 3.0

(3)  c + 2.6b = 4.0

(4)  c + 3.2b = 5.5

(5)  c + 4.0b = 6.9.

Form the two normal equations in the following manner. Add
the five equations to form the first normal equation. It would
be 5c + 12.7b = 20.3. The first normal equation can be formed in
this way only when there is a constant term involved, as c is in
this example. If there is not a constant term, one forms the first
normal equation in a manner similar to the others. The second
normal equation can be formed by first multiplying (1) by 1.0,
(2) by 1.9, and so on, multiplying each equation by the coefficient
of b in that equation and then adding together these computed
multiples of the original equations. The result is 12.7c +
37.61b = 62.2.

If there were more parameters, the corresponding normal
equations would be formed in a manner similar to that of forming the
second.
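The rule described in this part can be sketched as a short routine. This is only an illustration with a hypothetical function name; NumPy is assumed, and the coefficient matrix holds the five observation equations (1) through (5), whose first column of ones is the coefficient of c.

```python
import numpy as np

# Observation equations (1)-(5): columns are the coefficients of c and b.
coeffs = np.array([[1.0, 1.0],
                   [1.0, 1.9],
                   [1.0, 2.6],
                   [1.0, 3.2],
                   [1.0, 4.0]])
rhs = np.array([0.9, 3.0, 4.0, 5.5, 6.9])


def form_normal_equations(coeffs, rhs):
    """Form normal equation j by multiplying every observation equation
    by its own coefficient of parameter j and adding the results."""
    n_params = coeffs.shape[1]
    N = np.zeros((n_params, n_params))
    g = np.zeros(n_params)
    for j in range(n_params):
        for row, value in zip(coeffs, rhs):
            N[j] += row[j] * row
            g[j] += row[j] * value
    return N, g


N, g = form_normal_equations(coeffs, rhs)
print(N)  # rows: 5c + 12.7b and 12.7c + 37.61b
print(g)  # right-hand sides: 20.3 and 62.2
c, b = np.linalg.solve(N, g)
```

Because the coefficient of c is 1 in every equation, the first pass of the loop reduces to simply adding the five equations, as the text notes.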
The purpose of this report has been to collect and organize
some of the better methods of dealing with problems in regression
analysis that are related to non-linear parameters, including some
of the techniques that are used in industry which have not yet
found their way into publications.

Since  the  least  squares  method  of  estimation  is  basic  to  the 
methods  given  in  this  report,  its  detailed  development  has  been 
included  in  the  appendix.  This  makes  it  available  as  needed  without 
breaking  the  continuity  of  the  report. 

Due to the extensive length of non-linear estimation problems
and the increasing availability of high speed digital computers,
the majority of techniques that are included in this report are
those which have applications suitable to programs for digital
computers and require fairly simple programming. It should be noted
that regression analysis in general is no easy problem.

The procedure has been to illustrate the need for and the
development of the basic method of estimating non-linear parameters
called the Gauss-Newton Method. The theory of two of the
modifications that have proven their worth in applications is given.
Some examples have been included to illustrate the processes in
action.

The first of the methods considered is called "damped least
squares" and was developed by Mr. K. Levenberg of the Frankford
Arsenal and published in 1944. This method has solved some of the
more complicated problems in non-linear estimation and has a
relatively fast rate of convergence. It seems that one needs
considerable experience in order to set up the problem.

The second method is attributed to Dr. H. O. Hartley of Iowa
State University and was published in 1961 under the name of the
Modified Gauss-Newton Method. The applications of this method,
when considered in the light of some numerical short cuts, seem to
be usable on digital computers.

With the help of a technique used by Dr. Dale Cooper, of the
Continental Oil Company, this second method is simplified even
more to the point where a starting estimate of the linear
parameters is not required. This, in a sense, reduces the number of
approximations that are used in the equations.


